Sparsely Sampling the Sky: A Bayesian Experimental Design
Approach
ABSTRACT

The next generation of galaxy surveys will observe millions of galaxies over large 
volumes of the universe. These surveys are expensive both in time and cost, raising 
questions regarding the optimal investment of this time and money. In this work we in- 
vestigate criteria for selecting amongst observing strategies for constraining the galaxy 
power spectrum and a set of cosmological parameters. Depending on the parameters of 
interest, it may be more efficient to observe a larger, but sparsely sampled, area of sky 
instead of a smaller contiguous area. In this work, by making use of the principles of 
Bayesian Experimental Design, we will investigate the advantages and disadvantages 
of the sparse sampling of the sky and discuss the circumstances in which a sparse 
survey is indeed the most efficient strategy. For the Dark Energy Survey (DES), we 
find that by sparsely observing the same area in a smaller amount of time, we only 
increase the errors on the parameters by a maximum of 0.45%. Conversely, investing 
the same amount of time as the original DES to observe a sparser but larger area of 
sky we can in fact constrain the parameters with errors reduced by 28%. 

Key words: cosmology 



1 INTRODUCTION 

The measurement of cosmological parameters relies heavily on accurate measurements of power spectra. A power spectrum describes the spatial distribution of an isotropic random field and is defined as the Fourier transform of the spatial correlation function. The perturbations in the universe can be described statistically using the correlation function $\xi(r)$ between two points, which depends only on their separation $r$ (when isotropy is assumed):

$$\xi(r) = \langle \delta(\mathbf{x})\, \delta(\mathbf{x} + \mathbf{r}) \rangle , \qquad (1)$$

where $\delta(\mathbf{x}) = (\rho(\mathbf{x}) - \bar{\rho})/\bar{\rho}$ measures the continuous over-density, $\rho(\mathbf{x})$ is the density at position $\mathbf{x}$ and $\bar{\rho}$ is the average density. The power spectrum $P(k)$, which is the Fourier transform of the correlation function, is enough to define the perturbations completely when they are assumed to be uncorrelated Gaussian random fields in Fourier space. Power spectra (or correlation functions) are what the surveys actually measure, and the cosmological parameters are inferred from them. These spectra are normally the product of the primordial power spectrum (which measures the statistical distribution of perturbations in the early universe) and a transfer function which depends on the cosmological parameters. Hence accurate measurements of the power spectra from surveys are very important for accurate measurements of the cosmological parameters.

The most important observed spatial power spectrum for cosmology is the galaxy power spectrum, the Fourier transform of the galaxy correlation function, which was first formulated by Peebles (1973). A galaxy survey lists the measured positions of the observed galaxies. As proposed by Peebles, these positions are modelled as a random Poissonian point process, where the galaxy density is modulated by the fluctuations in the underlying matter distribution and by the selection effects. The selection function of the survey is described by $\bar{n}(\mathbf{x})$, which is the expected galaxy density at position $\mathbf{x}$ in the absence of clustering. The fluctuations in the underlying matter density are given by $\delta(\mathbf{x})$, as described previously. The galaxy number over-density $(n(\mathbf{x}) - \bar{n})/\bar{n}$, which is the observed quantity, is related to the matter over-density via the bias $b$ (Kaiser 1984): galaxies trace dark matter up to this factor $b$. We define the galaxy power spectrum $P_g(k)$ as

$$P_g(k) = 2\pi \cdot b^2(k) \cdot k \cdot T^2(k) \cdot P_p(k) , \qquad (2)$$

where $P_p(k) = A_s k^{n_s - 1}$ is the primordial power spectrum. The transfer function $T(k)$ further depends upon the cosmological parameters (e.g., the matter density $\Omega_m$, the scalar spectral index $n_s$, etc.) responsible for the evolution of the universe. The bias $b$ relates the galaxy power spectrum to the matter power spectrum, as explained above.
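As a rough numerical illustration of Equation (2), the sketch below evaluates the galaxy power spectrum for a power-law primordial spectrum. The transfer function `toy_transfer`, the turnover scale `k_eq`, the constant bias and all default values are hypothetical stand-ins chosen only to produce a spectrum with a peak; they are not the transfer function or parameters used in this work.

```python
import numpy as np

def primordial_power(k, A_s=1.0, n_s=0.96):
    """Primordial spectrum P_p(k) = A_s * k**(n_s - 1)."""
    return A_s * k ** (n_s - 1.0)

def toy_transfer(k, k_eq=0.02):
    """Hypothetical stand-in for T(k): ~1 for k << k_eq, ~(k_eq/k)**2 for k >> k_eq."""
    return 1.0 / (1.0 + (k / k_eq) ** 2)

def galaxy_power(k, bias=1.0, A_s=1.0, n_s=0.96, k_eq=0.02):
    """P_g(k) = 2*pi * b^2 * k * T(k)^2 * P_p(k), here with a scale-independent bias."""
    return (2.0 * np.pi * bias**2 * k
            * toy_transfer(k, k_eq) ** 2
            * primordial_power(k, A_s, n_s))

# Evaluate on a logarithmic grid and locate the turnover of the spectrum
k = np.logspace(-4, 0, 400)
P = galaxy_power(k)
k_peak = k[np.argmax(P)]
print(f"toy spectrum peaks near k = {k_peak:.4f}")
```

With this toy transfer function the spectrum rises as $k^{n_s}$ on large scales and turns over near `k_eq`, mimicking the peak associated with matter-radiation equality.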

This power spectrum is very rich in terms of constraining a large range of cosmological parameters. On large scales this spectrum probes structure which is less affected by clustering and evolution. Hence these scales are still in the linear regime and have a "memory" of the initial state. The information from these regimes is, therefore, the cleanest since the Big Bang, and any knowledge on these large scales would shed light on the physics of the early universe and hence the primordial power spectrum. On intermediate scales the spectrum provides us with information about the evolution of the universe since the Big Bang; for example, the matter-radiation equality, which is responsible for the peak of the galaxy spectrum. The matter-radiation equality is a unique point in the history of the evolution, giving information about the amount of matter and radiation in the universe. On relatively small scales there is a great deal of information about galaxy clustering via the Baryon Acoustic Oscillations (BAO), which encode a characteristic scale: the sound horizon at the time of recombination. Therefore, measuring the galaxy power spectrum on a large range of scales can help us constrain the cosmological parameters responsible for the evolution of the universe as well as those describing its initial state.

Accurate measurements of the galaxy power spectrum are limited by two main factors: the Poisson noise and the cosmic variance. To overcome the Poisson noise, surveys aim to maximise the number of galaxies observed. The impressive constraints on cosmological parameters from previous and current surveys, such as 2dF (Croom et al. 2004) and SDSS (Adelman-McCarthy et al. 2008), have motivated even more ambitious future surveys such as DES (The Dark Energy Survey Collaboration 2005) and Euclid (Laureijs 2009), which aim to observe millions of galaxies over large volumes of the universe. Considering the large investments in time and money for these surveys, one wants to ask what the optimal survey strategy really is. In this work we investigate exactly this question and look for the optimal strategy for galaxy surveys such as DES and Euclid.

In this era of cosmology, where the statistical errors have reduced greatly and are now comparable with the systematics, observing, for example, a greater number of galaxies may not necessarily improve our results. We need to devise more strategic ways to make our observations and take control of our systematics. For example, to investigate larger scales, it may be more efficient to observe a larger, but sparsely sampled, area of sky instead of a smaller contiguous area. In this case we would gather a larger density of states in Fourier space, but at the expense of an increased correlation between different scales (aliasing). This would smooth out any features observed on these scales and decrease their significance. Here, by making use of Bayesian Experimental Design, we will investigate the advantages and disadvantages of sparse sampling and verify whether a complete contiguous survey is indeed the most efficient way of observing the sky for our purposes. The quantities of interest here are the galaxy power spectrum itself and a set of cosmological parameters that depend on this spectrum.

Previous work on sparse sampling includes Kaiser (1986) and Blake et al. (2006). Kaiser (1986) shows that measuring the large scale correlation function from a complete magnitude-limited redshift survey is actually not the most efficient approach. Instead, sampling a fraction of galaxies randomly, but to a fainter magnitude limit, will improve the constraints on the correlation function measurements significantly, for the same amount of observing time. Blake et al. (2006) have shown that sparse sampling (achieved by non-contiguous telescope pointings or, for a wide-field multi-object spectrograph, by having the fibres distributed randomly across the field-of-view) is preferred when the angular size of the sparse observed patches is much smaller than the angular scale of the features in the power spectrum (the acoustic features).



2 BAYESIAN EXPERIMENTAL DESIGN AND 
FIGURE-OF-MERIT 

Bayesian methods have recently been used in cosmology for model comparison and for deriving posterior probability distributions for parameters of different models. However, Bayesian statistics can do even more by handling questions about the performance of future experiments, based on our current knowledge (Liddle et al. 2006; Trotta 2007a,b). For example, Parkinson et al. (2007) use a Bayesian approach to constrain the dark energy parameters by optimising Baryon Acoustic Oscillations (BAO) surveys. By searching through a survey parameter space (which includes parameters such as redshift range, number of redshift bins, survey area, observing time, etc.) they find the optimal survey with respect to the dark energy equation-of-state parameters. Here we will use this strength of Bayesian statistics to optimise the strategy for observing the sky in galaxy surveys. There are three requirements for such an optimisation: 1. specify the parameters that define the experiment, which need to be optimised for an optimal survey; 2. specify the parameters to constrain, with respect to which the survey is optimised; 3. specify a quantity of interest, generally called the figure of merit (FoM), associated with the proposed experiment. The choice of the FoM depends on the questions being asked, as will be explained later in the text. We then want to extremise the FoM subject to constraints imposed by the experiment or by our knowledge about the nature of the universe. Below, we will explain the procedure.

Assume $e$ denotes the different experimental designs that we can implement and $M^i$ are the different models under consideration with their parameters $\theta^i$. Assume that experiment $o$ has been performed, so that this experiment's posterior $P(\theta|o)$ forms our prior probability function for the new experiment. The FoM (or utility) will depend on the set of parameters under investigation, the performed experiment (data) and the characteristics of the future experiment: $U(\theta, e, o)$. From the utility we can build the expected utility $E[U]$ as

$$E[U|e,o] = \sum_i P(M^i|o) \int d\hat{\theta}^i \, U(\hat{\theta}^i, e, o)\, P(\hat{\theta}^i|o, M^i) , \qquad (3)$$

where $\hat{\theta}^i$ represent the fiducial parameters for model $M^i$. This says: if a set of fiducial parameters, $\hat{\theta}$, correctly describes the universe and we perform an experiment $e$, then we can compute the utility function for that experiment, $U(\hat{\theta}, e, o)$. However, our knowledge of the universe is described by the current posterior distribution $P(\theta|o)$. Averaging the utility over the posterior accounts for the present uncertainty in the parameters, and summing over all the available models accounts for the uncertainty in the underlying true model. The aim is to select an experiment that extremises the utility function (or its expectation). The expected utility takes into account the current models and the uncertainties in their parameters and, therefore, extremising it takes into account the lack of knowledge of the true model of the universe.
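The averaging in Equation (3) can be sketched as a Monte Carlo estimate: draw parameter samples from the current posterior of each model, average the utility over them, and weight each model by its posterior probability. The function `expected_utility`, the toy utility and the two toy "models" below are hypothetical and only illustrate the bookkeeping; the dependence on the past data $o$ is assumed to be carried by the samples and the model probabilities.

```python
import numpy as np

def expected_utility(utility, design, posterior_samples, model_probs):
    """Monte Carlo estimate of E[U | e, o].

    posterior_samples[i] holds draws of theta from P(theta | o, M^i);
    model_probs[i] is P(M^i | o). The inner mean approximates the integral
    over theta, the outer sum runs over the models.
    """
    total = 0.0
    for prob, samples in zip(model_probs, posterior_samples):
        total += prob * np.mean([utility(theta, design) for theta in samples])
    return total

# Hypothetical posteriors for two one-parameter models
rng = np.random.default_rng(0)
samples_m1 = rng.normal(1.0, 0.1, size=(2000, 1))
samples_m2 = rng.normal(2.0, 0.1, size=(2000, 1))

def toy_utility(theta, design):
    # Illustrative only: the gain grows with the design "effort" and the parameter value
    return np.log(1.0 + design * theta[0] ** 2)

# Score a small set of candidate designs and keep the one with the largest expected utility
designs = [0.5, 1.0, 2.0]
scores = [
    expected_utility(toy_utility, e, [samples_m1, samples_m2], [0.7, 0.3])
    for e in designs
]
best_design = designs[int(np.argmax(scores))]
```

In a realistic application the utility would be one of the Fisher-matrix based figures of merit, evaluated at each fiducial parameter set.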

One of the common choices for the FoM is some function of the Fisher matrix, which is the expectation of the inverse covariance of the parameters in the Gaussian limit (we explain in the next section how a Fisher matrix is obtained). One example is the Dark Energy Task Force (DETF) FoM, which uses Fisher-matrix techniques to investigate how well each model experiment would be able to restrict the dark energy parameters $w_0$, $w_a$ and $\Omega_{DE}$. Three common FoMs, which we will be using as well, are

• A-optimality = log(trace(F))

the trace of the Fisher matrix (or its log), which is proportional to the sum of the variances. This prefers a spherical error region, but may not necessarily select the smallest volume.

• D-optimality = log(|F|)

the determinant of the Fisher matrix (or its log), which measures the inverse of the square of the parameter volume enclosed by the posterior. This is a good indicator of the overall size of the error over all parameter space, but is not sensitive to any degeneracies amongst the parameters.

• Entropy (also called the Kullback-Leibler divergence)

$$E = \int d\theta \, P(\theta|\hat{\theta}, e, o) \log \frac{P(\theta|\hat{\theta}, e, o)}{P(\theta|o)} = \frac{1}{2}\left[ \log|F| - \log|\Pi| - \mathrm{trace}(I - \Pi F^{-1}) \right] , \qquad (4)$$

where $P(\theta|\hat{\theta}, e, o)$ is the posterior distribution with Fisher matrix $F$ and $P(\theta|o)$ is the prior distribution with Fisher matrix $\Pi$. The entropy forms a nice compromise between the A-optimality and D-optimality. Note that these are the utility functions, not the 'expected' utility functions. In our current models of the universe, we do not expect a significant difference between the parameters of the same model. However, this will be investigated in future work, where we will explicitly use expected utility functions. In the next section we will explain how a Fisher matrix is formulated.
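The three figures of merit listed above are simple functions of the posterior Fisher matrix $F$ and the prior Fisher matrix $\Pi$. A minimal sketch, assuming both matrices are symmetric and positive definite (the function names and the small example matrices are illustrative, not taken from this work):

```python
import numpy as np

def a_optimality(F):
    """log of the trace of the Fisher matrix."""
    return np.log(np.trace(F))

def d_optimality(F):
    """log of the determinant of the Fisher matrix, computed stably."""
    sign, logdet = np.linalg.slogdet(F)
    if sign <= 0:
        raise ValueError("Fisher matrix must have a positive determinant")
    return logdet

def entropy(F, Pi):
    """0.5 * [log|F| - log|Pi| - trace(I - Pi F^-1)] for Gaussian prior and posterior."""
    n = F.shape[0]
    _, logdet_F = np.linalg.slogdet(F)
    _, logdet_Pi = np.linalg.slogdet(Pi)
    trace_term = np.trace(np.eye(n) - Pi @ np.linalg.inv(F))
    return 0.5 * (logdet_F - logdet_Pi - trace_term)

# Illustrative 2x2 example: posterior = likelihood + prior
Pi = np.diag([1.0, 4.0])
L = np.array([[3.0, 1.0], [1.0, 2.0]])
F = L + Pi

print(a_optimality(F), d_optimality(F), entropy(F, Pi))
```

With this definition the entropy vanishes when the new experiment adds nothing ($F = \Pi$) and is positive otherwise, which is why it can be read as the information gained relative to the prior.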



3 FISHER MATRIX ANALYSIS 

The Fisher matrix is generally used to determine the sensitivity of a particular survey to a set of parameters and has been widely used for optimisation (and forecasting). Consider the likelihood function for a future experiment with experimental parameters $e$, $\mathcal{L}(\theta|e) \equiv P(D_{\hat{\theta}}|\hat{\theta}, e)$, where $D_{\hat{\theta}}$ are simulated data from the future experiment assuming that $\hat{\theta}$ are the true parameters in the given model. We Taylor expand the log-likelihood around its maximum value:

$$\ln\mathcal{L}(\theta|e) = \ln\mathcal{L}(\theta^{ML}) + \frac{1}{2}\sum_{ij} (\theta_i - \theta_i^{ML}) \frac{\partial^2 \ln\mathcal{L}}{\partial\theta_i \partial\theta_j} (\theta_j - \theta_j^{ML}) , \qquad (5)$$

where the first term is a constant that only affects the height of the function, and the second term describes how fast the likelihood function falls around the maximum. The Fisher matrix is defined as the ensemble average of the curvature of the likelihood function $\mathcal{L}$ (i.e., it is the average of the curvature over many realisations of signal and noise):

$$F_{ij} = \langle \mathcal{F}_{ij} \rangle = \left\langle -\frac{\partial^2 \ln\mathcal{L}}{\partial\theta_i \partial\theta_j} \right\rangle \qquad (6)$$

$$\phantom{F_{ij}} = \frac{1}{2}\, \mathrm{trace}\left[ C_{,i}\, C^{-1} C_{,j}\, C^{-1} \right] , \qquad (7)$$



where the second line is appropriate for a Gaussian distribution with covariance matrix $C$ determined by the parameters $\theta_i$ (a comma denotes differentiation, $C_{,i} \equiv \partial C/\partial\theta_i$), and $\mathcal{L}$ is the likelihood function. The inverse of the Fisher matrix is an approximation of the covariance matrix of the parameters, by analogy with a Gaussian distribution in the $\theta_i$, for which this would be exact. The Cramer-Rao inequality (see footnote 2) states that the smallest frequentist error measured, for $\theta_i$, by any unbiased estimator (such as the maximum likelihood) is $1/\sqrt{F_{ii}}$ and $\sqrt{(F^{-1})_{ii}}$, for non-marginalised and marginalised (see footnote 3) one-sigma errors respectively. The derivatives in Equation (6) generally depend on where in the parameter space they are calculated, and hence the Fisher matrix is a function of the fiducial parameters.
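For Gaussian data, the trace expression for the Fisher matrix and the two kinds of one-sigma error quoted above can be sketched as follows. The two-parameter data model (a fixed signal template `S` plus white noise) and every numerical value are hypothetical and serve only to exercise the formula.

```python
import numpy as np

def gaussian_fisher(C, dC):
    """F_ij = 0.5 * trace(C_,i C^-1 C_,j C^-1) for zero-mean Gaussian data.

    C  : data covariance matrix at the fiducial parameters
    dC : list of derivatives of C with respect to each parameter
    """
    Cinv = np.linalg.inv(C)
    A = [dCi @ Cinv for dCi in dC]  # C_,i C^-1 for every parameter
    n = len(dC)
    F = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            F[i, j] = F[j, i] = 0.5 * np.trace(A[i] @ A[j])
    return F

# Hypothetical toy model: C = theta_0 * S + theta_1 * I
N = 40
x = np.linspace(0.0, 1.0, N)
S = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)
theta = np.array([1.0, 0.5])
C = theta[0] * S + theta[1] * np.eye(N)
dC = [S, np.eye(N)]  # C is linear in both parameters

F = gaussian_fisher(C, dC)
sigma_fixed = 1.0 / np.sqrt(np.diag(F))          # other parameters held fixed
sigma_marg = np.sqrt(np.diag(np.linalg.inv(F)))  # other parameters marginalised
```

The marginalised errors are never smaller than the non-marginalised ones; the two coincide only when the parameters are uncorrelated.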

The Fisher matrix allows us to estimate the errors on parameters without having to cover the whole parameter space (but of course it will only be appropriate so long as the derivatives are roughly constant throughout the space). A Fisher matrix analysis is therefore equivalent to the assumption of a Gaussian distribution about the peak of the likelihood (e.g. Bond et al. 1998). It also makes the calculations easier. For example, if we are only interested in a subset of parameters, then marginalising over the unwanted parameters amounts to inverting the Fisher matrix, taking only the rows and columns of the wanted parameters and inverting the smaller matrix back. It is also very straightforward to combine constraints from different independent experiments: we just sum the Fisher matrices of the experiments (recall that the Fisher matrix is built from the log of the likelihood function, and the log-likelihoods of independent experiments add).
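The two manipulations described above, marginalising over unwanted parameters and combining independent experiments, take only a few lines. This is a minimal sketch; the function names and example matrices are illustrative.

```python
import numpy as np

def marginalise(F, keep):
    """Fisher matrix for the parameters in `keep`, marginalised over all others.

    Invert F, keep the rows and columns of the wanted parameters,
    then invert the smaller matrix back.
    """
    cov = np.linalg.inv(F)
    sub_cov = cov[np.ix_(keep, keep)]
    return np.linalg.inv(sub_cov)

def combine(*fishers):
    """Fisher matrix of several independent experiments on the same parameters."""
    return np.sum(fishers, axis=0)

# Illustrative example with two correlated parameters
F1 = np.array([[4.0, 1.0], [1.0, 6.0]])
F2 = np.array([[2.0, 0.0], [0.0, 1.0]])

F_first_only = marginalise(F1, [0])  # information on parameter 0 alone
F_joint = combine(F1, F2)
```

Marginalising can only reduce the information on the remaining parameters: here the marginalised value is 23/6, smaller than the original diagonal entry of 4.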

We further note, as in all uses of the Fisher matrix, that any results thus obtained must be taken with the caveat that these relations only map onto realistic error bars in the case of a Gaussian distribution, usually most appropriate in the limit of high signal-to-noise ratio and/or relatively small scales, so that the conditions of the central limit theorem hold. As long as we do not find extremely degenerate parameter directions, we expect that our results will be indicative of a full analysis using simulations and techniques such as Bayesian Experimental Design (Trotta 2007c).

3.1 Fisher Matrix for Galaxy Surveys 

We follow the approach of Tegmark (1997) to define the pixelisation for galaxy surveys. First we define the data in pixel $i$ as

$$\Delta_i = \int d^3x \, \psi_i(\mathbf{x}) \, \frac{n(\mathbf{x}) - \bar{n}}{\bar{n}} , \qquad (8)$$



2 It should be noted that the Cramer-Rao inequality is a state- 
ment about the so-called "Frequentist" confidence intervals and is 
not strictly applicable to "Bayesian" errors. 

3 Integration of the joint probability over other parameters. 



4 P. Paykari and A. H. Jaffe 



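where $n(\mathbf{x})$ is the galaxy density at position $\mathbf{x}$ and $\bar{n}$ is the expected number of galaxies at that position. The weighting function, $\psi_i(\mathbf{x})$, which determines the pixelisation (and is sensitive to the shape of the survey, as discussed later), is defined as a set of Fourier pixels

$$\psi_i(\mathbf{x}) = \frac{e^{i\mathbf{k}_i \cdot \mathbf{x}}}{V} \times \begin{cases} 1 & \mathbf{x} \text{ inside survey volume} \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

where $V$ is the volume of the survey. Here we have divided the volume into sub-volumes, each being much smaller than the total volume of the survey, but large enough to contain many galaxies. This means $\Delta_i$ is the fractional over-density in pixel $i$. Using this pixelisation we can define a covariance matrix as

$$\langle \Delta_i \Delta_j^* \rangle = C_{ij} = (C_S)_{ij} + (C_N)_{ij} , \qquad (10)$$

where $C_S$ and $C_N$ are the signal and noise covariance matrices respectively and are assumed independent of each other. The signal covariance matrix can be defined as

$$(C_S)_{ij} = \langle \Delta_i \Delta_j^* \rangle = \int d^3x \, d^3x' \, \psi_i(\mathbf{x})\, \psi_j^*(\mathbf{x}') \left\langle \frac{n(\mathbf{x}) - \bar{n}}{\bar{n}} \, \frac{n(\mathbf{x}') - \bar{n}}{\bar{n}} \right\rangle . \qquad (11)$$

By equating the number over-density $(n(\mathbf{x}) - \bar{n})/\bar{n}$ to the continuous over-density $\delta(\mathbf{x}) = (\rho(\mathbf{x}) - \bar{\rho})/\bar{\rho}$ we obtain

$$(C_S)_{ij} = \int \frac{d^3k}{(2\pi)^3} P(k)\, \tilde{\psi}_i(\mathbf{k})\, \tilde{\psi}_j^*(\mathbf{k}) = \int \frac{dk}{(2\pi)^3}\, k^2 P(k) \int d\Omega_k \, \tilde{\psi}_i(\mathbf{k})\, \tilde{\psi}_j^*(\mathbf{k}) = \int \frac{dk}{(2\pi)^3}\, k^2 P(k)\, W_{ij}(k) , \qquad (12)$$

where $\tilde{\psi}_i(\mathbf{k})$ is the Fourier transform of $\psi_i(\mathbf{x})$ and the window function $W_{ij}(k)$ is defined as the angular average of the square of the Fourier transform of the weighting function. With the same approach, the noise covariance matrix, which is due to Poisson shot noise, is given by

$$(C_N)_{ij} = \langle N_i N_j^* \rangle_{\rm Noise} = \int d^3x \, d^3x' \, \psi_i(\mathbf{x})\, \psi_j^*(\mathbf{x}') \, \frac{1}{\bar{n}} \, \delta_D(\mathbf{x} - \mathbf{x}') = \frac{1}{\bar{n}} \int \frac{d^3k}{(2\pi)^3} \, \tilde{\psi}_i(\mathbf{k})\, \tilde{\psi}_j^*(\mathbf{k}) . \qquad (13)$$

The design of the survey will shape the form of the weighting function in Equation (9), which will be discussed in the next section.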

This prescription gives us a data covariance matrix for a galaxy survey. What we actually need is a Fisher matrix for the parameters we are interested in. For this we will use Equation (6) above, which defines the Fisher matrix of the parameters in terms of the inverse of the data covariance matrix and its derivatives with respect to the parameters of interest. We are interested in the galaxy power spectrum, and hence the derivative of the covariance matrix in Equation (6) is taken with respect to the bins of this power spectrum. As the noise covariance matrix does not depend on the power spectrum, we only need to differentiate the signal covariance matrix in Equation (12). Taking the galaxy power spectrum as a series of top-hat bins

$$P(k) = \sum_B w_B(k)\, P_B , \qquad w_B(k) = \begin{cases} 1 & k \in B \\ 0 & \text{otherwise} \end{cases} \qquad (14)$$

where $P_B$ is the power in each bin, the derivative takes the form

$$\frac{\partial (C_S)_{ij}}{\partial P_B} = \int_{k_B^{\min}}^{k_B^{\max}} \frac{dk}{(2\pi)^3}\, k^2\, W_{ij}(k) . \qquad (15)$$

We insert this and the inverse of the data covariance matrix into Equation (6) to get a Fisher matrix for the galaxy power spectrum bins. To get a Fisher matrix for the cosmological parameters one can use the Jacobian of the band powers with respect to the parameters,

$$F_{\alpha\beta} = \sum_{ab} \frac{\partial P_a}{\partial \lambda_\alpha} \, F_{ab} \, \frac{\partial P_b}{\partial \lambda_\beta} , \qquad (16)$$

where $F_{ab}$ is the galaxy spectrum Fisher matrix and $F_{\alpha\beta}$ is the Fisher matrix for the cosmological parameters $\lambda_\alpha$ and $\lambda_\beta$.


4 SURVEY DESIGN 

We will compare the FoM of a sparse design with that of a contiguous survey, which we have chosen to be similar to the Dark Energy Survey (DES).



4.1 Dark Energy Survey (DES) 

The Dark Energy Survey (DES; The Dark Energy Survey Collaboration 2005) is designed to probe the origin of the accelerating universe and help uncover the nature of dark energy. Its digital camera, DECam, is mounted on the Blanco 4-meter telescope at Cerro Tololo Inter-American Observatory in the Chilean Andes. Starting in December 2012 and continuing for five years, DES will catalogue 300 million galaxies in the southern sky over an area of 5000 square degrees and a redshift range of 0.2 < z < 1.3. In the next section we will explain how we 'sparsify' the DES survey for our purposes.

Here, we use a flat-sky approximation. Euclid, with a survey area of 20,000 square degrees, should be treated on the full sky and is not investigated here. Nonetheless we expect results qualitatively similar to those for DES.



4.2 Sparse Design 

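For simplicity, we will design the sparsely sampled area of the sky as a regular grid of $n_p \times n_p$ square patches of size $M \times M$ (Figure 1). We therefore define the structure on the sky as a top-hat in both the $x$ and $y$ directions

$$\Pi(x - x_n) = \begin{cases} 1 & |x - x_n| < M/2 \\ 0 & \text{otherwise} \end{cases} \qquad (17)$$

$$\Pi(y - y_m) = \begin{cases} 1 & |y - y_m| < M/2 \\ 0 & \text{otherwise} \end{cases} \qquad (18)$$

where $x_n$ and $y_m$ mark the centres of the patches in our coordinate system. In the $z$ direction we use the step function, which is defined as:

$$\Theta(z) = \begin{cases} 1 & z > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (19)$$

Footnote (Section 4.1, DES): http://www.darkenergysurvey.org/

Figure 1. Design of the mask on the sky to sparsely sample the sky. A regular grid with $n$ patches of size $M$ (note that we are observing through these patches, the white squares in the Figure), placed at constant distances from one another at $x_n$ and $y_m$. The total observed area is the sum of the areas of all the patches, $n \times M^2$, and the total sampled area is the total area which bounds both the masked and the unmasked areas, $V$. Hence the fraction of sky observed is $f = (n \times M^2)/A_{tot}$. Also, note that we are assuming a flat-sky approximation.

With this design the weight function in Equation (9) takes the form:

$$\tilde{\psi}_i(\mathbf{k}) = \int \frac{d^3x}{V}\, e^{i\mathbf{q}\cdot\mathbf{x}} \sum_n \Pi(x - x_n) \sum_m \Pi(y - y_m)\, \Theta\!\left(z + \frac{L}{2}\right) \Theta\!\left(\frac{L}{2} - z\right)$$

$$= \frac{1}{V} \sum_n \int dx \, e^{i q_x x}\, \Pi(x - x_n) \times \sum_m \int dy \, e^{i q_y y}\, \Pi(y - y_m) \times \int dz \, e^{i q_z z}\, \Theta\!\left(z + \frac{L}{2}\right) \Theta\!\left(\frac{L}{2} - z\right)$$

$$= \frac{M^2 L}{V} \left[ \mathrm{sinc}\!\left(q_x \frac{M}{2}\right) \sum_n 2\cos(q_x x_n) \right] \left[ \mathrm{sinc}\!\left(q_y \frac{M}{2}\right) \sum_m 2\cos(q_y y_m) \right] \mathrm{sinc}\!\left(q_z \frac{L}{2}\right) , \qquad (20)$$

where $\mathbf{q} = \mathbf{k}_i - \mathbf{k}$, $q_x = q\sin\theta\cos\phi$, $q_y = q\sin\theta\sin\phi$, $q_z = q\cos\theta$ and $d\mu = d\cos\theta$. The volume $V$ is the total sparsely sampled volume, $M$ is the size of the observed patch on the surface of the sky and $L$ is the observed depth. The last equality in the above equation uses the Dirichlet kernel

$$D_n(x) = \sum_{k=-n}^{n} e^{ikx} = 1 + 2\sum_{k=1}^{n} \cos(kx) , \qquad (21)$$

which can be used due to the symmetry of the design. The window function, defined in Equation (12), now takes the form

$$W_{ij}(k) = \int_{-1}^{1} \frac{d\mu}{2} \int_0^{2\pi} \frac{d\phi}{2\pi}\, \tilde{\psi}_i(\mathbf{k}_i - \mathbf{k})\, \tilde{\psi}_j^*(\mathbf{k}_j - \mathbf{k})$$

$$= \int_{-1}^{1} \frac{d\mu}{2} \int_0^{2\pi} \frac{d\phi}{2\pi} \left( \frac{M^2 L}{V} \right)^2 \left[ \mathrm{sinc}\!\left(q_x \frac{M}{2}\right) \sum_n 2\cos(q_x x_n) \right] \left[ \mathrm{sinc}\!\left(q'_x \frac{M}{2}\right) \sum_{n'} 2\cos(q'_x x_{n'}) \right]$$

$$\times \left[ \mathrm{sinc}\!\left(q_y \frac{M}{2}\right) \sum_m 2\cos(q_y y_m) \right] \left[ \mathrm{sinc}\!\left(q'_y \frac{M}{2}\right) \sum_{m'} 2\cos(q'_y y_{m'}) \right] \mathrm{sinc}\!\left(q_z \frac{L}{2}\right) \mathrm{sinc}\!\left(q'_z \frac{L}{2}\right) , \qquad (22)$$

where $\mathbf{q}' = \mathbf{k}_j - \mathbf{k}$ is defined analogously to $\mathbf{q}$.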



Note that there are two scales that control the behaviour of the window function: one is the size of the patches, $M$, and the other is their distance from one another, set by the patch positions $x_n$. We will investigate the influence of both of these scales on the FoM by trying two different configurations, discussed in the next section. In the case of contiguous sampling of the sky, where we are observing through a contiguous square, the window function takes the form of one single big patch, as shown below

$$W_{ij}(k) = \int_{-1}^{1} \frac{d\mu}{2} \int_0^{2\pi} \frac{d\phi}{2\pi}\, \mathrm{sinc}\!\left(q_x \frac{M}{2}\right) \mathrm{sinc}\!\left(q'_x \frac{M}{2}\right) \mathrm{sinc}\!\left(q_y \frac{M}{2}\right) \mathrm{sinc}\!\left(q'_y \frac{M}{2}\right) \mathrm{sinc}\!\left(q_z \frac{L}{2}\right) \mathrm{sinc}\!\left(q'_z \frac{L}{2}\right) , \qquad (23)$$

which corresponds to a survey volume in the shape of a square cylinder.
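The structure of the sparse-mask transform, a sinc envelope set by the patch size multiplied by a sum of cosines set by the patch positions, can be checked numerically in one dimension. The sketch below compares the analytic form with a direct midpoint-rule integral over the patches. The grid of four patches, the patch size and the spacing are illustrative values; the sum runs over all patch centres of a grid symmetric about the origin, which for an even number of patches equals the sum of $2\cos(q x_n)$ over the centres with $x_n > 0$.

```python
import numpy as np

def sinc(u):
    """Unnormalised sinc, sin(u)/u (np.sinc is the normalised version)."""
    return np.sinc(u / np.pi)

def patch_centres(n_side, spacing):
    """Centres of n_side patches on a regular 1D grid symmetric about x = 0."""
    return (np.arange(n_side) - 0.5 * (n_side - 1)) * spacing

def mask_transform_analytic(q, M, centres):
    """M * sinc(q M / 2) * sum_n cos(q x_n); the sine terms cancel by symmetry."""
    return M * sinc(0.5 * q * M) * np.sum(np.cos(np.outer(q, centres)), axis=1)

def mask_transform_numeric(q, M, centres, n_sub=2000):
    """Midpoint-rule integral of exp(i q x) over every observed patch."""
    offsets = (np.arange(n_sub) + 0.5) / n_sub * M - 0.5 * M
    x = (centres[:, None] + offsets[None, :]).ravel()
    dx = M / n_sub
    return np.sum(np.exp(1j * np.outer(q, x)), axis=1) * dx

q = np.linspace(0.0, 0.5, 200)    # wavenumbers, in 1/Mpc
centres = patch_centres(4, 60.0)  # four patches, 60 Mpc apart (illustrative)
analytic = mask_transform_analytic(q, 30.0, centres)
numeric = mask_transform_numeric(q, 30.0, centres)
print(np.max(np.abs(analytic - numeric.real)))
```

When the patch size equals the spacing, the cosine sum and the sinc envelope combine into the sinc of a single patch of width `n_side * M`, which is the contiguous limit.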



4.3 Sparsifying DES 

We divide the total area of DES into small square patches, as explained in the design of the mask previously. There are two ways to sparsify this area:

• Constant Total Area (full sampled area stays constant)
In this setting we keep the patches at a constant position and gradually decrease their size. Therefore, the total sampled area is kept constant, while the total observed area decreases as the patch sizes decrease. The patches are placed 60 Mpc from one another; this is about half of the BAO scale, which is ~120 Mpc. The patches are placed at half this scale to best capture the BAO features. This restricts the maximum size of the patches to 60 Mpc, for which $f = 1$. We then shrink them from 60 Mpc to 10 Mpc. The minimum size of 10 Mpc was chosen to avoid entering the non-linear regime at < 10 Mpc. This configuration is shown in Figure 2. In this case, as we make our observations more sparse, the total observing time decreases as well; we could instead choose to observe more deeply in the same amount of time and gain volume in the redshift direction.

• Constant Observed Area (footprint of the survey stays constant)
In this setting the size of the patches is kept fixed at 60 Mpc, and the area is sparsified by placing the patches further and further from one another. Here the total observed area is constant, while the total sampled area increases as the patches are moved further apart. This configuration is shown in Figure 3. Now, the length of time for the survey remains the same, but is spread out over a larger area of sky.
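The two scenarios can be summarised by how the observed fraction $f$ depends on the patch size and the patch spacing. The sketch below assumes one square patch per square grid cell, so that $f = (M/\mathrm{spacing})^2$, and that observing time scales with the observed area; the function names and the list of spacings are illustrative.

```python
def fraction_constant_total_area(patch_size, spacing=60.0):
    """Observed fraction when patches stay on a fixed grid and shrink (sizes in Mpc)."""
    if not 0.0 < patch_size <= spacing:
        raise ValueError("patch size must lie in (0, spacing]")
    return (patch_size / spacing) ** 2

def fraction_constant_observed_area(spacing, patch_size=60.0):
    """Observed fraction when fixed-size patches are moved further apart."""
    if spacing < patch_size:
        raise ValueError("patches would overlap")
    return (patch_size / spacing) ** 2

# Scenario 1: same footprint, less observed area (and, by assumption, less observing time)
for M in (60.0, 40.0, 20.0, 10.0):
    f = fraction_constant_total_area(M)
    print(f"M = {M:4.0f} Mpc   f = {f:.3f}   relative observing time = {f:.3f}")

# Scenario 2: same observed area, footprint grows as 1/f
for d in (60.0, 90.0, 120.0):
    f = fraction_constant_observed_area(d)
    print(f"spacing = {d:5.0f} Mpc   f = {f:.3f}   relative sampled area = {1.0 / f:.2f}")
```

Both scenarios reach the contiguous survey at $f = 1$, where patch size and spacing are both 60 Mpc.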

Note that the areas we consider here are small enough that the flat-sky approximation is valid. Also note that in all the above settings we keep the number of bins of the galaxy power spectrum constant at $n_{bin} = 60$. In reality we should let the total volume of the survey choose the binning of the power spectrum via $k_{min} = (2\pi/V)^{1/3} = dk$, and hence the number of bins $n_{bin}$. However, if $n_{bin}$ changes from case to case it will be unfair to compare D-optimality and Entropy, as they will have different units as $n_{bin}$ changes. To have a fair comparison between the cases we keep $n_{bin}$ constant.

Footnote (total sampled area): This is the total area including both the masked and unmasked areas.



5 RESULTS 

We have chosen a geometrically flat $\Lambda$CDM model with adiabatic perturbations. We have a five-parameter model with the following values for the parameters: $\Omega_m = 0.214$, $\Omega_b = 0.044$, $\Lambda = 0.742$, $\tau = 0.087$ and $h = 0.719$, where $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$. The FoMs used are

$$\mathrm{Entropy} = \frac{1}{2}\left[ \ln|F| - \ln|\Pi| - \mathrm{trace}(I - \Pi F^{-1}) \right] , \qquad (24)$$

$$\text{A-optimality} = \ln(\mathrm{trace}(F)) , \qquad (25)$$

$$\text{D-optimality} = \ln(|F|) , \qquad (26)$$

where $\Pi$ is the prior Fisher matrix, which we have chosen to be that for an SDSS-LRG-like survey. The posterior Fisher matrix is $F = L + \Pi$, where $L$ is the likelihood Fisher matrix, which is that of the sparse survey we have designed. The utility functions above are defined so that they need to be maximised for an optimal design.



5.1 Constant Total Area 

Figure 4 shows the FoM for the galaxy power spectrum bins on the left and the cosmological parameters on the right. In both cases, the Entropy, A-optimality and D-optimality all increase with $f$. This is as expected, since a contiguous sampling of the sky captures all the information and should be the best to constrain cosmology. The top panels in the Figure show A-optimality for the bins on the left and the cosmological parameters on the right. In both cases, A increases with $f$ and reaches its maximum at $f = 1$ for DES. Note that A-optimality is a measure of the errors of the parameters only, since it is based on the trace of the Fisher matrix. Therefore, it does not account for the correlations between parameters. Although A increases with $f$ for both the bins and the parameters, this increase is very small. To see the amount of change in each of the elements of the power spectrum Fisher matrix as $f$ increases, look at the top panel of Figure 5. This shows the diagonal elements of the Fisher matrix $F$ for the galaxy power spectrum bins for the different $f$. The curves all lie on top of each other, and indeed the gain obtained by increasing $f$ is very small.

The middle panels of Figure 4 show D-optimality, which again increases with $f$ for both the bins and the parameters. Note that D-optimality is a measure of the determinant of the Fisher matrix and therefore takes the correlation between the parameters into account. The correlation between the parameters is indeed very important; one disadvantage of sparse sampling is the correlation it induces between the parameters due to aliasing. To see this effect, look at the bottom panel of Figure 5, where the row of the Fisher matrix that corresponds to the middle bin of the power spectrum is shown. Going away from the peak in both directions, the elements show the correlation between the different bins and the middle one. As $f$ decreases and the survey becomes more and more sparse, the power in the off-diagonal elements of the Fisher matrix increases, meaning there is more aliasing. The DES survey, as a full contiguous survey, has the least aliasing, while the sparsest survey has the most. The rise towards small $k$ (large scales) is due to sample variance.

Looking at the correlations and the errors in the Fisher matrix of the spectrum, one notes that the decrease in D-optimality for sparser surveys is mostly due to the increased correlation between the bins rather than the increased errors; as we saw in the top panel of Figure 5, the change in the errors is negligible. In general we conclude that the total aliasing induced by sparsity is small and the loss in the constraining power of the survey due to this aliasing is negligible. Hence, overall, little is gained by observing the sky more contiguously.

Figure 2. Survey geometry for the 'constant total area' scenario (Section 5.1). In this setting we keep the patches at a constant position and gradually decrease their size. Therefore, the total sampled area (i.e., the total extent of the survey) is kept constant, while the total observed area (and hence the survey observing time) decreases as the patch sizes decrease.

The bottom panels in Figure 4 show the Entropy for the bins and the parameters. Again, the Entropy increases with $f$ and reaches its maximum for DES. The Entropy measures the total size of the errors of the parameters in the Fisher matrix as well as their correlation. Hence it is a good compromise between A- and D-optimality. It measures the total information gain of the survey relative to a prior survey. Having an SDSS-like survey as our prior, and taking into account both the errors and the correlation between the parameters, the contiguous DES survey has the largest gain compared to the sparse surveys. However, note that this gain is again very small.

Figure 6 shows the relative loss in the marginalised errors of each of the cosmological parameters with respect to DES. The largest loss for a sparse observation of the sky is on the spectral index with $\delta\Omega_\Lambda/\Omega_\Lambda \sim 0.45\%$ and the smallest is for $\Omega_c$ with a loss of $\delta\Omega_c/\Omega_c \sim 0.15\%$. The non-marginalised errors show a qualitatively different behaviour, where $n_s$ has the largest and $\Omega_\Lambda$ has the smallest loss.

5.2 Constant Observed Area 

Figure 7 shows the FoM for the power spectrum bins and the cosmological parameters. In this case the Entropy, A-optimality and D-optimality all decrease with $f$. Moreover, the overall changes in all the FoMs are much larger than the ones seen in the previous scenario, for both the bins and the parameters.

The top panel of Figure 8 shows the diagonal elements 
of the Fisher matrix of the bins. As we sparsify the survey 
these elements increase, and hence the spectrum is better 
constrained. The bottom panel of the Figure shows the row of 
the Fisher matrix that corresponds to the middle bin of the 
spectrum. Going away from the peak, the elements show the 
correlation between the different bins and the middle one. 
Figure 3. Survey geometry for the 'constant observed area' scenario (Section 5.2). In this setting the size of the patches is kept fixed 
at 60 Mpc, and the area is sparsified by placing the patches further and further from one another. Here the total observed area (and 
hence the survey observing time) is constant, while the total sampled area (i.e., the total extent of the survey) increases as the patches 
are placed further and further apart. 
Figure 4. 'Constant total area': Figure of Merit for galaxy power spectrum bins on the left and cosmological parameters on the right. 
In both cases, the Entropy, A-optimality and D-optimality all increase with f. This is as expected, as a contiguous sampling of the sky 
captures all the information and should be the best to constrain cosmology. However, note that the increase is very small. In 
general we conclude that the loss in the constraining power of the survey due to sparsity is negligible and, overall, little is gained by 
observing the sky more contiguously. Therefore, sparse surveys seem a good substitute for contiguous surveys, with less 
observing time and lower cost. 



For DES the middle bin is correlated with its close 
neighbouring bins, and the correlation decreases away from 
the peak. Towards small k (large scales) it starts to increase 
again due to sample variance. As f decreases and the survey 
becomes sparser, the middle bin shows a sharper drop (due to 
the larger total size of the survey), i.e., less correlation 
with neighbouring bins. However, there is more aliasing 
between distant bins. There are also peaks (i.e., larger 
correlations) at certain scales which are related to the 
distances between the patches, and which change case by case. 
The DES survey, as a full contiguous survey, 






[Figure 5 plot: 'Constant Total Area: Power Spectrum'; horizontal axis k / h Mpc^-1, from 0.00 to 0.10] 



Figure 5. 'Constant total area': The top panel shows the diagonal elements of the Fisher matrix for the power spectrum bins for different f. 
The increase in these elements (which translates into a decrease in the variance) is negligible as sparsity increases. 
The bottom panel shows the row of the Fisher matrix that corresponds to the middle bin of the power spectrum. Going away from the peak 
in both directions, these elements show the correlation between the different bins and the middle one. As f decreases and the survey becomes 
sparser, the power in the off-diagonal elements of the Fisher matrix increases, meaning there is more aliasing between the bins. 
The DES survey, as a full contiguous survey, has the least aliasing, while the sparsest survey has the most aliasing. The uniform increase 
at low k (large scales) is due to sample variance. 



has indeed the least aliasing, while the sparsest survey has 
the most. 
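
To compare the bin-to-bin coupling across designs whose diagonal elements differ, it can help to normalise the Fisher row by the diagonal. The sketch below does this; the normalisation F_ij / sqrt(F_ii F_jj) is an illustrative choice made here (the figures in this work show the raw row), and the function name is hypothetical.

```python
import numpy as np

def fisher_correlation_row(fisher, index):
    """Row `index` of the Fisher matrix, scaled to F_ij / sqrt(F_ii * F_jj).

    The entry at `index` itself is 1 by construction; the other entries
    measure how strongly each bin is coupled to the chosen bin.
    """
    fisher = np.asarray(fisher, dtype=float)
    sigma = np.sqrt(np.diag(fisher))
    return fisher[index] / (sigma[index] * sigma)
```

Applied to the middle bin, a contiguous design would be expected to give a narrow peak around the bin itself, while a sparse design with regularly placed patches would show additional bumps at the scales set by the patch separation.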

Note that in this case the sparsity is obtained by placing 
the observed patches further and further away from each 
other. As the patches are placed further apart, the total 
size of the survey increases greatly, which seems to make up 
for the aliasing that the sparse design induces. Overall we 
gain a great deal by spending the same amount of time on a 
larger but sparsely sampled area. 
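
The link between the patch separation and the scales at which aliasing peaks appear can be illustrated with a toy one-dimensional cut through a regular mask. This is only a sketch under stated assumptions: the filling fraction is taken to be the ratio of observed to sampled area for square patches on a square grid, the box is periodic, and the 120 Mpc spacing used in the note below is a hypothetical value (only the 60 Mpc patch size comes from the text). Function names are hypothetical.

```python
import numpy as np

def filling_fraction(patch_size, spacing):
    """Observed-to-sampled area ratio for square patches on a square grid."""
    return (patch_size / spacing) ** 2

def mask_window_1d(patch_size, spacing, n_patches, dx=1.0):
    """Return (k, |W(k)|^2) for a 1D cut through a regular patch mask.

    The mask is 1 inside each patch and 0 in the gaps, repeated
    `n_patches` times in a periodic box sampled with step `dx`.
    """
    n_per_period = int(round(spacing / dx))
    n_open = int(round(patch_size / dx))
    period = np.zeros(n_per_period)
    period[:n_open] = 1.0
    mask = np.tile(period, n_patches)

    window = np.fft.rfft(mask) / mask.size
    k = 2.0 * np.pi * np.fft.rfftfreq(mask.size, d=dx)
    return k, np.abs(window) ** 2

def strongest_alias(k, power):
    """Wavenumber of the largest non-zero-k peak of the window power."""
    peak = np.argmax(power[1:]) + 1
    return k[peak]
```

For 60 Mpc patches repeated every 120 Mpc the two-dimensional filling fraction is 0.25, and the strongest non-zero mode of the toy window sits at k = 2π/120 ≈ 0.052 Mpc^-1, i.e. at the scale set by the patch separation. Moving the patches further apart shifts this peak to smaller k.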

Figure 9 shows the relative gain in the marginalised 
errors of each of the cosmological parameters with respect 
to DES. The largest gain for a sparse observation of the sky 
is on $\Omega_\Lambda$ with $\delta\Omega_\Lambda/\Omega_\Lambda \approx 27\%$ and the smallest is for $\Omega_c$ 
with a gain of $\delta\Omega_c/\Omega_c \approx 7\%$. Again, a qualitatively different 
scenario is seen for the non-marginalised errors; has the 
largest gain due to sparsity, and h has the smallest. 



6 CONCLUSION 

In this work we have investigated the advantages and dis- 
advantages of sparsely sampling the sky as opposed to a 
contiguous observation. By making use of Bayesian Experi- 
mental Design, we have defined our Figure of Merit as dif- 
ferent functions of the Fisher matrix. These FoM capture 
different aspects of the parameters of interest such as their 
overall variance, the correlation between them or a measure 
Figure 7. 'Constant observed area': Figure of Merit for galaxy power spectrum bins on the left and cosmological parameters on the 
right. In this case the Entropy, A-optimality and D-optimality all decrease with f, and the overall changes in all the FoM are much 
larger than those seen in the previous scenario. It seems that the increase in the total size of the survey due to sparsity can make up 
for the aliasing that the sparse design induces. Overall we gain a great deal by spending the same amount of time on a larger but sparsely 
sampled area. Note that the f = 0.07 case is only for illustration purposes, as it covers an area larger than the area of the sky. 
Figure 6. 'Constant total area': Relative change in the errors 
of the cosmological parameters. The largest loss due to 
sparsifying the survey is about 0.45%. 



of both, as in Entropy. By optimising these functions we 
investigate an optimal survey design for estimating the galaxy 
power spectrum and a set of cosmological parameters. We 
have compared a series of sparse designs to the contiguous 
design of DES. We split the area of the DES survey into small 
square patches and sparsified the survey in two ways: 

(i) by shrinking the size of the patches while they are 
kept at a constant position. In this case the total sampled 
area of the survey is constant while the observed area (and 
the survey observing time) shrinks. This means the total 
information gained from the survey reduces in each case. In 
this scenario all three FoM (A-optimality, D-optimality 
and Entropy) increase with f, both for the power spectrum 
bins and the cosmological parameters. This is expected, as a 
contiguous sampling should capture all the information and 
constrain cosmology best. However, we note that this 
increase with decreasing sparsity is very small for both the 
bins and the cosmological parameters. Looking at the variance 
and the covariance of the parameters, we note that the 
slight degradation of the surveys due to sparsity is mostly 
because of the increased correlation between the bins 
(aliasing) rather than increased errors. In general we 
conclude that the total aliasing induced by sparsity is small and 
the loss in the constraining power of the survey because of 
it is negligible. Hence, overall, little is gained by observing 
the sky more contiguously. Indeed, the largest loss in 
the errors of the cosmological parameters is of the order 
of ~0.45% in the sparsest case. 

(ii) by keeping the size of the patches constant, but plac- 
ing them further and further from one another. In this sce- 
nario the observed area (and observing time) is kept con- 
stant, while sparsifying means larger and larger total sam- 
pled area. This means the total information gained from the 
Figure 8. 'Constant observed area': The top panel shows the diagonal elements of the Fisher matrix for the power spectrum bins for different f. 
As we sparsify the survey these elements increase, and hence the spectrum is better constrained. The bottom panel shows the row of the 
Fisher matrix that corresponds to the middle bin of the spectrum. Going away from the peak, the elements show the correlation between 
the different bins and the middle one. For DES the middle bin is correlated with its close neighbouring bins, but the correlation 
decreases away from the peak. Towards small k (large scales) it starts to increase again due to sample variance. As f decreases 
and the survey becomes sparser, the middle bin shows a sharper drop (due to the larger total size of the survey), i.e., less correlation with 
neighbouring bins. However, there is more aliasing between distant bins. There are also peaks (i.e., larger correlations) at certain scales 
which are related to the distances between the patches, and which change case by case. The DES survey, as a full contiguous survey, has 
the least aliasing, while the sparsest survey has the most aliasing. Note that the f = 0.07 case is only for illustration purposes, as 
it covers an area larger than the area of the sky. 



survey in each case is the same. Therefore, there are two 
competing factors: one is the increase in the total sampled 
area as the survey is sparsified, and the other is the aliasing 
induced by the larger and larger sparse mask on the sky. 

In this case all FoM decrease with f, and the change in 
the FoM is much larger than that seen in the previous 
scenario. As we sparsify the survey, the decrease in errors 
makes up for the increased aliasing and hence causes 
a general improvement in the constraining power of the survey. 
Overall we gain a great deal by spending the same amount of 
time on a larger but sparsely sampled area. Indeed, we gain as 
much as ~27% on the sparsest survey, which is a significant 
improvement. 



We conclude that sparse sampling could be a good substitute 
for contiguous observations and indeed the way forward 
for future surveys. At least for small areas of the sky, such 
as that of DES, sparse sampling can reduce cost and 
observing time while obtaining nearly the same constraints 
on the cosmological parameters. On the other 
hand, we can spend the same amount of time but sparsely 






[Figure 9 plot: 'Constant Observed Area - Cosmological Parameters'; legend: f = 1.00 (DES), f = 0.26, f = 0.12, f = 0.07] 



Tegmark M., 1997, Physical Review Letters, 79, 3806 
The Dark Energy Survey Collaboration, 2005, ArXiv Astrophysics e-prints 
Trotta R., 2007a, MNRAS, 378, 72 
Trotta R., 2007b, MNRAS, 378, 819 
Trotta R., 2007c, MNRAS, 378, 819 



Figure 9. 'Constant observed area': Relative change in the 
errors of the cosmological parameters. The largest gain is about 
27% by sparsifying the survey. Note that the f = 0.07 case is only 
for illustration purposes, as it covers an area larger than the area 
of the sky. 



observe a larger area of the sky. This greatly improves the 
constraining power of the survey. 

In this work we have chosen square observation patches, 
which may be the worst shape in terms of the correlation 
they induce. Another constraint in this design is the 
fixed, predetermined positions of the patches, which cause 
a loss of information at certain scales. The advantage of 
this approach has been its analytical formalism, which has 
made it possible to understand the important factors in 
sparse sampling. In future work we will investigate an 
optimal shape for the patches and adopt a numerical approach 
in which the patches are randomly distributed on the sky. 
This causes an even loss of information on all scales and is 
expected to improve the results greatly. 